Pruning Decision Trees with Misclassification Costs (08-FEB-1998)

Author

  • Carla E. Brodley
Abstract

We describe an experimental study of pruning methods for decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. We perform an empirical comparison of these methods and evaluate them with respect to the following three criteria: loss, mean-squared-error (MSE), and log-loss. We provide a bias-variance decomposition of the MSE to show how pruning affects the bias and variance. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods, both for loss minimization and for estimating probabilities. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of 10. While no method dominated others on all datasets, even for the same domain different pruning mechanisms are better for different loss matrices. We show this last result using Receiver Operating Characteristic (ROC) curves.

1 Pruning Decision Trees

Decision trees are a widely used symbolic modeling technique for classification tasks in machine learning. The most common approach to constructing decision tree classifiers is to grow a full tree and prune it back. Pruning is desirable because the tree that is grown may overfit the data by inferring more structure than is justified by the training set. Specifically, if there are no conflicting instances, the training set error of a fully built tree is zero, while the true error is likely to be larger. To combat this overfitting problem, the tree is pruned back with the goal of identifying the tree with the lowest error rate on previously unobserved instances, breaking ties in favor of smaller trees (Breiman, Friedman, Olshen & Stone 1984, Quinlan 1993). Several pruning methods have been introduced in the literature, including cost-complexity pruning (Breiman et al. 1984), reduced error pruning and pessimistic pruning (Quinlan 1987), error-based pruning (Quinlan 1993), penalty pruning (Mansour 1997), and MDL pruning (Quinlan & Rivest 1989, Mehta, Rissanen & Agrawal 1995, Wallace & Patrick 1993). Esposito, Malerba & Semeraro (1995a, 1995b) have compared several of these pruning algorithms for error minimization. Oates & Jensen (1997) showed that most pruning algorithms create trees that are larger than necessary if error minimization is the evaluation criterion.

Our objective in this paper is different from the above-mentioned studies. Instead of pruning to minimize error, we aim to study pruning algorithms with two related goals: loss minimization and probability estimation. Historically, most pruning algorithms have been developed to minimize the expected error rate of the decision tree, assuming that classification errors have the same unit cost. However, in many practical applications one has a loss matrix associated with classification errors (Turney 1997, Fawcett & Provost 1996, Kubat, Holte & Matwin 1997, Danyluk & Provost 1993). In such cases, it may be desirable to prune the tree with respect to the loss matrix or to prune in order to optimize the accuracy of a probability distribution given for each instance.
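To make the Laplace correction mentioned in the abstract concrete, the following is a minimal sketch of how smoothed class probabilities at a leaf might be computed; the function name and the two-class example counts are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def laplace_leaf_distribution(class_counts, classes):
    """Estimate P(class | leaf) with a Laplace correction.

    Each raw frequency n_i / N is replaced by (n_i + 1) / (N + k), where N is
    the number of training instances reaching the leaf and k is the number of
    classes, pulling estimates away from 0 and 1 at small leaves.
    """
    total = sum(class_counts.get(c, 0) for c in classes)
    k = len(classes)
    return {c: (class_counts.get(c, 0) + 1) / (total + k) for c in classes}

# Hypothetical leaf with 2 "sick" and 0 "healthy" training instances.
counts = Counter({"sick": 2, "healthy": 0})
print(laplace_leaf_distribution(counts, ["sick", "healthy"]))
# {'sick': 0.75, 'healthy': 0.25} instead of the raw frequencies 1.0 and 0.0
```

Smoothing matters here because a raw estimate of 1.0 or 0.0 at a small leaf makes the expected-loss calculation below overly confident.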
A probability distribution may be used to adjust the prediction to minimize the expected loss or to supply a confidence level associated with the prediction; in addition, a probability distribution may also be used to generate a lift curve (Berry & Linoff 1997). Pruning for loss minimization or for probability estimation can lead to different pruning behavior than does pruning for error minimization. Figure 1 (left) shows an example where the subtree should be pruned by error-minimization algorithms because the number of errors stays the same (5/100) if the subtree is pruned to a leaf. If the problem has an associated loss matrix that specifies that misclassifying someone who is sick as healthy is ten times as costly as classifying someone who is healthy as sick, then we do not want the pruning algorithm to prune this subtree. For this loss matrix, pruning the tree leads to a loss of 50, whereas retaining the tree leads to a loss of 5 (the left-hand leaf would classify instances as sick to minimize the expected loss). Figure 1 (right) illustrates the reverse situation: error-based pruning would retain the subtree, whereas cost-based pruning would prune the subtree. Given the same loss ma...
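To make the Figure 1 trade-off explicit, here is a small sketch of expected loss at a node under a loss matrix. The class counts are illustrative numbers chosen to reproduce the loss of 50 (pruned) versus 5 (retained) described above; they are assumptions, not the exact counts from the paper's figure.

```python
def expected_loss(counts, predicted, loss):
    """Total loss of labelling every instance at a node with `predicted`.

    counts: {true_class: n_instances}; loss[true_class][predicted] is the
    cost of predicting `predicted` for an instance of `true_class`.
    """
    return sum(n * loss[true_cls][predicted] for true_cls, n in counts.items())

def best_label(counts, loss):
    """Pick the label that minimizes the total loss at a node."""
    return min(loss, key=lambda label: expected_loss(counts, label, loss))

# Loss matrix from the text: misclassifying "sick" as "healthy" costs 10,
# the reverse costs 1, and correct predictions cost 0.
loss = {"sick":    {"sick": 0, "healthy": 10},
        "healthy": {"sick": 1, "healthy": 0}}

# Illustrative split: the pruned node holds 95 healthy and 5 sick instances;
# the subtree isolates the 5 sick instances in a leaf with 5 healthy ones.
pruned = {"healthy": 95, "sick": 5}
left_leaf, right_leaf = {"healthy": 5, "sick": 5}, {"healthy": 90, "sick": 0}

print(expected_loss(pruned, best_label(pruned, loss), loss))      # 50 if pruned
print(sum(expected_loss(leaf, best_label(leaf, loss), loss)
          for leaf in (left_leaf, right_leaf)))                   # 5 if retained
```

Under zero-one loss both choices make the same 5 errors, so an error-minimizing pruner is indifferent; under this loss matrix the retained subtree is ten times cheaper, which is exactly the behavior the paper's loss-based pruning is meant to capture.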


Similar Articles

Boosting Trees for Cost-Sensitive Classifications

This paper explores two boosting techniques for cost-sensitive tree classifications in the situation where misclassification costs change very often. Ideally, one would like to have only one induction, and use the induced model for different misclassification costs. Thus, it demands robustness of the induced model against cost changes. Combining multiple trees gives robust predictions against this ...


MDL-Based Decision Tree Pruning

This paper explores the application of the Minimum Description Length principle for pruning decision trees. We present a new algorithm that intuitively captures the primary goal of reducing the misclassification error. An experimental comparison is presented with three other pruning algorithms. The results show that the MDL pruning algorithm achieves good accuracy, small trees, and fast execution ...


Pruning Decision Trees with Misclassification Costs (appears in ECML-98 as a research note; a longer version is available as ECE TR 98-3, Purdue University)

We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical com...


Pruning Regression Trees with MDL

Pruning is a method for reducing the error and complexity of induced trees. There are several approaches to pruning decision trees, while regression trees have attracted less attention. We propose a method for pruning regression trees based on the sound foundations of the MDL principle. We develop coding schemes for various constructs and models in the leaves and empirically test the new method...


Generalization in Decision Trees and DNF: Does Size Matter?

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we show that these techniques can be more wid...


